The art of every age strives to give language to the sacred, silent longing within us. — Hermann Hesse, Peter Camenzind
Preface

I am studying K8s, so I am writing these notes to consolidate what I remember. The theory in this article comes from Chapter 11 of the 4th edition of 《Kubernetes权威指南:从Docker到Kubernetes实践全接触》 (The Definitive Guide to Kubernetes: From Docker to Kubernetes Hands-On Practice); what follows are my organized study notes.
Because there is no concrete demo, the article feels a bit thin: it reads more like a set of guiding principles and can be dry. So the practical takeaway comes first: a list of sites for researching problems (see "Seeking help" at the end). I will add concrete cases for each topic when I get the chance. If you are pressed for time and trying to fix a live problem, your best bet is still to search for the problem description on those platforms.
Troubleshooting common problems in a Kubernetes cluster
To track down and diagnose problems with containerized applications running in a Kubernetes cluster, we commonly use the following approaches.

Inspect the current runtime information of the Kubernetes objects involved, especially the Events associated with each object. These events record the subject, the first and most recent occurrence times, the occurrence count, and the reason, all of which are very valuable when investigating a failure. Looking at an object's runtime data also surfaces obvious problems such as wrong parameters, broken associations between objects, and abnormal states. Because many kinds of Kubernetes objects reference one another, this step may require examining several related objects.
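For example, a minimal sketch of pulling recent Events, here scoped to the kube-system namespace (the namespace and object name are just illustrations):

```bash
# List recent events in the kube-system namespace, oldest first
kubectl get events -n kube-system --sort-by=.metadata.creationTimestamp
# Narrow the events down to a single object by name
kubectl get events -n kube-system --field-selector involvedObject.name=etcd-vms81.liruilongs.github.io
```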
For problems with Services and containers, we may need to dig into the container itself to diagnose the fault; examining the container's logs is the usual way to pin down the specific problem.
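When the logs alone are not conclusive, one way to go deeper is an interactive shell inside the container, assuming its image ships a shell (the pod and namespace names here are placeholders):

```bash
# Open a shell inside the Pod's first container (use -c to pick another container)
kubectl exec -it <pod_name> -n <namespace> -- sh
```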
For some complex problems, such as cluster-wide issues like Pod scheduling, we may need to combine the Kubernetes service logs from every node in the cluster: the kube-apiserver, kube-scheduler, and kube-controller-manager logs on the Master, plus the kubelet and kube-proxy logs on each Node.
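On a kubeadm-built cluster like the one in this article, the Master components run as static Pods while kubelet runs under systemd, so a hedged sketch of collecting those logs might look like this (the Pod name follows this cluster's naming and would differ elsewhere):

```bash
# Control-plane component log (a static Pod in the kube-system namespace)
kubectl logs kube-apiserver-vms81.liruilongs.github.io -n kube-system --tail=100
# Node daemon log, collected on the node itself via systemd's journal
journalctl -u kubelet -n 100
```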
Viewing system Events

After a Pod is created in a Kubernetes cluster, we can list Pods with the kubectl get pods command, but the information that command displays is limited. Kubernetes provides the kubectl describe pod command to view a Pod's details: it shows the configuration the Pod was created with, its state, and the most recent Events related to the Pod, and that event information is very useful for troubleshooting. An example of the output appears below.
If a Pod stays in the Pending state, kubectl describe reveals the specific cause. Common ones include (see the sketch after this list):

- No Node is available for scheduling, possibly because of a Pod port conflict or because of Taints on the nodes.
- Resource quota management is enabled, but the target node of the current scheduling attempt does not have enough free resources.
- The image download failed, and so on.
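A short sketch for checking two of those causes (the node name matches this cluster and is otherwise illustrative):

```bash
# Are the candidate nodes carrying taints the Pod does not tolerate?
kubectl describe node vms82.liruilongs.github.io | grep -i taint
# Did the scheduler record FailedScheduling events explaining why?
kubectl get events --field-selector reason=FailedScheduling
```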
Viewing detailed Pod information
```bash
┌──[root@vms81.liruilongs.github.io]-[~]
└─$kubectl describe pods etcd-vms81.liruilongs.github.io -n kube-system
Name:                 etcd-vms81.liruilongs.github.io
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 vms81.liruilongs.github.io/192.168.26.81
Start Time:           Tue, 25 Jan 2022 21:54:20 +0800
Labels:               component=etcd
                      tier=control-plane
Annotations:          kubeadm.kubernetes.io/etcd.advertise-client-urls: https://192.168.26.81:2379
                      kubernetes.io/config.hash: 1502584f9ab841720212d4341d723ba2
                      kubernetes.io/config.mirror: 1502584f9ab841720212d4341d723ba2
                      kubernetes.io/config.seen: 2021-12-13T00:01:04.834825537+08:00
                      kubernetes.io/config.source: file
                      seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status:               Running
IP:                   192.168.26.81
IPs:
  IP:           192.168.26.81
Controlled By:  Node/vms81.liruilongs.github.io
Containers:
  etcd:
    Container ID:  docker://20d99a98a4c2590e8726916932790200ba1cf93c48f3c84ca1298ffdcaa4f28a
    Image:         registry.aliyuncs.com/google_containers/etcd:3.5.0-0
    Image ID:      docker-pullable://registry.aliyuncs.com/google_containers/etcd@sha256:9ce33ba33d8e738a5b85ed50b5080ac746deceed4a7496c550927a7a19ca3b6d
    Port:          <none>
    Host Port:     <none>
    Command:
      etcd
      --advertise-client-urls=https://192.168.26.81:2379
      --cert-file=/etc/kubernetes/pki/etcd/server.crt
      --client-cert-auth=true
      --data-dir=/var/lib/etcd
      --initial-advertise-peer-urls=https://192.168.26.81:2380
      --initial-cluster=vms81.liruilongs.github.io=https://192.168.26.81:2380
      --key-file=/etc/kubernetes/pki/etcd/server.key
      --listen-client-urls=https://127.0.0.1:2379,https://192.168.26.81:2379
      --listen-metrics-urls=http://127.0.0.1:2381
      --listen-peer-urls=https://192.168.26.81:2380
      --name=vms81.liruilongs.github.io
      --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
      --peer-client-cert-auth=true
      --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
      --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
      --snapshot-count=10000
      --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    State:          Running
      Started:      Tue, 25 Jan 2022 21:54:20 +0800
    Last State:     Terminated
      Reason:       Error
      Exit Code:    255
      Started:      Mon, 24 Jan 2022 08:35:16 +0800
      Finished:     Tue, 25 Jan 2022 21:53:56 +0800
    Ready:          True
    Restart Count:  128
    Requests:
      cpu:        100m
      memory:     100Mi
    Liveness:     http-get http://127.0.0.1:2381/health delay=10s timeout=15s period=10s
    Startup:      http-get http://127.0.0.1:2381/health delay=10s timeout=15s period=10s
    Environment:  <none>
    Mounts:
      /etc/kubernetes/pki/etcd from etcd-certs (rw)
      /var/lib/etcd from etcd-data (rw)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  etcd-certs:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/pki/etcd
    HostPathType:  DirectoryOrCreate
  etcd-data:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/etcd
    HostPathType:  DirectoryOrCreate
QoS Class:         Burstable
Node-Selectors:    <none>
Tolerations:       :NoExecute op=Exists
Events:            <none>
┌──[root@vms81.liruilongs.github.io]-[~]
└─$
```
Viewing the cluster's Nodes and their details
```bash
[root@liruilong k8s]# kubectl get nodes
NAME        STATUS    AGE
127.0.0.1   Ready     2d
[root@liruilong k8s]# kubectl describe node 127.0.0.1
Name:                   127.0.0.1
Role:
Labels:                 beta.kubernetes.io/arch=amd64
                        beta.kubernetes.io/os=linux
                        kubernetes.io/hostname=127.0.0.1
Taints:                 <none>
CreationTimestamp:      Fri, 27 Aug 2021 00:07:09 +0800
Phase:
Conditions:
  Type            Status  LastHeartbeatTime                LastTransitionTime               Reason                      Message
  ----            ------  -----------------                ------------------               ------                      -------
  OutOfDisk       False   Sun, 29 Aug 2021 23:05:53 +0800  Sat, 28 Aug 2021 00:30:35 +0800  KubeletHasSufficientDisk    kubelet has sufficient disk space available
  MemoryPressure  False   Sun, 29 Aug 2021 23:05:53 +0800  Fri, 27 Aug 2021 00:07:09 +0800  KubeletHasSufficientMemory  kubelet has sufficient memory available
  DiskPressure    False   Sun, 29 Aug 2021 23:05:53 +0800  Fri, 27 Aug 2021 00:07:09 +0800  KubeletHasNoDiskPressure    kubelet has no disk pressure
  Ready           True    Sun, 29 Aug 2021 23:05:53 +0800  Sat, 28 Aug 2021 00:30:35 +0800  KubeletReady                kubelet is posting ready status
Addresses:              127.0.0.1,127.0.0.1,127.0.0.1
Capacity:
 alpha.kubernetes.io/nvidia-gpu:  0
 cpu:                             1
 memory:                          1882012Ki
 pods:                            110
Allocatable:
 alpha.kubernetes.io/nvidia-gpu:  0
 cpu:                             1
 memory:                          1882012Ki
 pods:                            110
System Info:
 Machine ID:                 963c2c41b08343f7b063dddac6b2e486
 System UUID:                EB90EDC4-404C-410B-800F-3C65816C0E2D
 Boot ID:                    4a9349b0-ce4b-4b4a-8766-c5c4256bb80b
 Kernel Version:             3.10.0-1160.15.2.el7.x86_64
 OS Image:                   CentOS Linux 7 (Core)
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://1.13.1
 Kubelet Version:            v1.5.2
 Kube-Proxy Version:         v1.5.2
ExternalID:                  127.0.0.1
Non-terminated Pods:         (3 in total)
  Namespace  Name         CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------  ----         ------------  ----------  ---------------  -------------
  default    mysql-2cpt9  0 (0%)        0 (0%)      0 (0%)           0 (0%)
  default    myweb-53r32  0 (0%)        0 (0%)      0 (0%)           0 (0%)
  default    myweb-609w4  0 (0%)        0 (0%)      0 (0%)           0 (0%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ------------  ----------  ---------------  -------------
  0 (0%)        0 (0%)      0 (0%)           0 (0%)
Events:
  FirstSeen  LastSeen  Count  From                 SubObjectPath  Type     Reason             Message
  ---------  --------  -----  ----                 -------------  ----     ------             -------
  4h         27m       3      {kubelet 127.0.0.1}                 Warning  MissingClusterDNS  kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. pod: "myweb-609w4_default(01d719dd-08b1-11ec-9d6a-00163e1220cb)". Falling back to DNSDefault policy.
  25m        25m       1      {kubelet 127.0.0.1}                 Warning  MissingClusterDNS  kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. pod: "mysql-2cpt9_default(1c9353ba-08d7-11ec-9d6a-00163e1220cb)". Falling back to DNSDefault policy.
```
Viewing container logs

When we need to examine the logs produced by the application inside a container, we can use the kubectl logs <pod_name> command. Here we print the logs of the etcd database and look for abnormal entries by filtering on the keyword error:
```bash
┌──[root@vms81.liruilongs.github.io]-[~]
└─$kubectl logs etcd-vms81.liruilongs.github.io -n kube-system | grep -i error | head -5
{"level":"info","ts":"2022-01-25T13:54:33.191Z","caller":"wal/repair.go:96","msg":"repaired","path":"/var/lib/etcd/member/wal/0000000000000014-0000000000185aba.wal","error":"unexpected EOF"}
{"level":"info","ts":"2022-01-25T13:54:33.192Z","caller":"etcdserver/storage.go:109","msg":"repaired WAL","error":"unexpected EOF"}
{"level":"warn","ts":"2022-01-25T13:54:33.884Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"127.0.0.1:53950","server-name":"","error":"EOF"}
{"level":"warn","ts":"2022-01-25T13:54:33.885Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"127.0.0.1:53948","server-name":"","error":"EOF"}
{"level":"warn","ts":"2022-01-28T03:00:37.549Z","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"628.230855ms","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/runtimeclasses/\" range_end:\"/registry/runtimeclasses0\" count_only:true ","response":"","error":"context canceled"}
┌──[root@vms81.liruilongs.github.io]-[~]
└─$
```
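One usage note: if the container has already crashed and been restarted, the current log may no longer contain the failure. The --previous flag prints the log of the last terminated instance instead:

```bash
# Logs of the previous (terminated) container instance
kubectl logs etcd-vms81.liruilongs.github.io -n kube-system --previous
```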
Viewing Kubernetes service logs

If Kubernetes is installed on a Linux system and its services are managed by systemd, then systemd's journal takes over the services' output logs. In that environment, system service logs can be viewed with the systemctl status or journalctl tools. For example, check the service's startup information; from it you can find which configuration files the service loaded and how its startup parameters are set:
```bash
┌──[root@vms81.liruilongs.github.io]-[~]
└─$systemctl status kubelet.service -l
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since 二 2022-01-25 21:53:35 CST; 6 days ago
     Docs: https://kubernetes.io/docs/
 Main PID: 1014 (kubelet)
   Memory: 208.2M
   CGroup: /system.slice/kubelet.service
           └─1014 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --network-plugin=cni --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.5

2月 01 17:47:14 vms81.liruilongs.github.io kubelet[1014]: W0201 17:47:14.258523    1014 container.go:586] Failed to update stats for container "/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pode1b874bfdef201d69db10b200b8f47d5.slice/docker-c20fa960cfebd38172e123a5d87ecd499518bf22381f7aaa62d57131e7eb1aae.scope": unable to determine device info for dir: /var/lib/docker/overlay2/07d7695f2c479fbd0b654016345fcbacd0838276fb57f8291f993ed6799fae8d/diff: stat failed on /var/lib/docker/overlay2/07d7695f2c479fbd0b654016345fcbacd0838276fb57f8291f993ed6799fae8d/diff with error: no such file or directory, continuing to push stats
...
```
Use journalctl to view the related service logs; here we search the kubelet service log for entries containing the keyword error:
```bash
┌──[root@vms81.liruilongs.github.io]-[~]
└─$journalctl -u kubelet.service | grep -i error | head -2
1月 25 21:53:55 vms81.liruilongs.github.io kubelet[1014]: I0125 21:53:55.865441    1014 docker_service.go:264] "Docker Info" dockerInfo=&{ID:HN3K:C6LG:QGV7:N2CG:VELF:CJ6T:HFR5:EEKH:HLPO:CDEU:GN3E:QAJJ Containers:32 ContainersRunning:11 ContainersPaused:0 ContainersStopped:21 Images:32 Driver:overlay2 DriverStatus:[[Backing Filesystem xfs] [Supports d_type true] [Native Overlay Diff true] [userxattr false]] SystemStatus:[] Plugins:{Volume:[local] Network:[bridge host ipvlan macvlan null overlay] Authorization:[] Log:[awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog]} MemoryLimit:true SwapLimit:true KernelMemory:true KernelMemoryTCP:true CPUCfsPeriod:true CPUCfsQuota:true CPUShares:true CPUSet:true PidsLimit:true IPv4Forwarding:true BridgeNfIptables:true BridgeNfIP6tables:true Debug:false NFd:26 OomKillDisable:true NGoroutines:39 SystemTime:2022-01-25T21:53:55.833509372+08:00 LoggingDriver:json-file CgroupDriver:systemd CgroupVersion:1 NEventsListener:0 KernelVersion:3.10.0-693.el7.x86_64 OperatingSystem:CentOS Linux 7 (Core) OSVersion:7 OSType:linux Architecture:x86_64 IndexServerAddress:https://index.docker.io/v1/ RegistryConfig:0xc000a8f960 NCPU:2 MemTotal:4126896128 GenericResources:[] DockerRootDir:/var/lib/docker HTTPProxy: HTTPSProxy: NoProxy: Name:vms81.liruilongs.github.io Labels:[] ExperimentalBuild:false ServerVersion:20.10.9 ClusterStore: ClusterAdvertise: Runtimes:map[io.containerd.runc.v2:{Path:runc Args:[] Shim:<nil>} io.containerd.runtime.v1.linux:{Path:runc Args:[] Shim:<nil>} runc:{Path:runc Args:[] Shim:<nil>}] DefaultRuntime:runc Swarm:{NodeID: NodeAddr: LocalNodeState:inactive ControlAvailable:false Error: RemoteManagers:[] Nodes:0 Managers:0 Cluster:<nil> Warnings:[]} LiveRestoreEnabled:false Isolation: InitBinary:docker-init ContainerdCommit:{ID:5b46e404f6b9f661a205e28d59c982d3634148f8 Expected:5b46e404f6b9f661a205e28d59c982d3634148f8} RuncCommit:{ID:v1.0.2-0-g52b36a2 Expected:v1.0.2-0-g52b36a2} InitCommit:{ID:de40ad0 Expected:de40ad0} SecurityOptions:[name=seccomp,profile=default] ProductLicense: DefaultAddressPools:[]
1月 25 21:53:56 vms81.liruilongs.github.io kubelet[1014]: E0125 21:53:56.293100    1014 controller.go:144] failed to ensure lease exists, will retry in 200ms, error: Get "https://192.168.26.81:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/vms81.liruilongs.github.io?timeout=10s": dial tcp 192.168.26.81:6443: connect: connection refused
┌──[root@vms81.liruilongs.github.io]-[~]
└─$
```
If systemd is not used to capture a Kubernetes service's standard output, you can also use the log-related startup parameters to specify the directory where logs are stored. Of course, for components that run as Pods, the configured startup parameters have to be checked from the Pod definition itself.

Check kube-controller-manager's startup parameters and the authentication-related configuration files:
```bash
┌──[root@vms81.liruilongs.github.io]-[~]
└─$kubectl describe pod kube-controller-manager-vms81.liruilongs.github.io -n kube-system | grep -i -A 20 command
    Command:
      kube-controller-manager
      --allocate-node-cidrs=true
      --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
      --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
      --bind-address=127.0.0.1
      --client-ca-file=/etc/kubernetes/pki/ca.crt
      --cluster-cidr=10.244.0.0/16
      --cluster-name=kubernetes
      --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt
      --cluster-signing-key-file=/etc/kubernetes/pki/ca.key
      --controllers=*,bootstrapsigner,tokencleaner
      --kubeconfig=/etc/kubernetes/controller-manager.conf
      --leader-elect=true
      --port=0
      --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
      --root-ca-file=/etc/kubernetes/pki/ca.crt
      --service-account-private-key-file=/etc/kubernetes/pki/sa.key
      --service-cluster-ip-range=10.96.0.0/12
      --use-service-account-credentials=true
    State:          Running
```
```bash
┌──[root@vms81.liruilongs.github.io]-[~]
└─$kubectl describe pod kube-controller-manager-vms81.liruilongs.github.io -n kube-system | grep kubeconfig
      --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
      --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
      --kubeconfig=/etc/kubernetes/controller-manager.conf
      /etc/kubernetes/controller-manager.conf from kubeconfig (ro)
  kubeconfig:
```
```bash
┌──[root@vms81.liruilongs.github.io]-[~]
└─$kubectl describe pod kube-controller-manager-vms81.liruilongs.github.io -n kube-system | grep -i -A 20 Volumes
Volumes:
  ca-certs:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/ssl/certs
    HostPathType:  DirectoryOrCreate
  etc-pki:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/pki
    HostPathType:  DirectoryOrCreate
  flexvolume-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/libexec/kubernetes/kubelet-plugins/volume/exec
    HostPathType:  DirectoryOrCreate
  k8s-certs:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/pki
    HostPathType:  DirectoryOrCreate
  kubeconfig:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/controller-manager.conf
    HostPathType:  FileOrCreate
┌──[root@vms81.liruilongs.github.io]-[~]
└─$
```
Problems with Pod resource objects, such as a Pod that cannot be created, a Pod that stops right after starting, or Pod replicas that fail to scale up. In these cases, first determine which node the Pod is on, then log in to that node and pull the Pod's full log from the kubelet log to investigate, as sketched below.
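A minimal sketch of that workflow (the pod name and namespace are placeholders):

```bash
# 1. Find out which node the Pod was scheduled to
kubectl get pod <pod_name> -n <namespace> -o wide
# 2. Log in to that node and search the kubelet journal for the Pod's name
journalctl -u kubelet | grep <pod_name>
```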
For problems related to Pod scaling or to RCs, the key clues are most likely found in the kube-controller-manager and kube-scheduler logs.
```bash
# Note: these static Pods live in the kube-system namespace
┌──[root@vms81.liruilongs.github.io]-[~]
└─$kubectl logs kube-scheduler-vms81.liruilongs.github.io -n kube-system
┌──[root@vms81.liruilongs.github.io]-[~]
└─$kubectl logs kube-controller-manager-vms81.liruilongs.github.io -n kube-system
```
kube-proxy is often overlooked, because even if it stops unexpectedly, Pods still report a normal status, yet access to some Services becomes abnormal. Such errors are usually closely tied to the kube-proxy service on each node. When you run into them, first check the kube-proxy service log, and also inspect the firewall service, paying special attention to any suspicious rules someone may have added by hand.
```bash
┌──[root@vms81.liruilongs.github.io]-[~]
└─$kubectl logs kube-proxy-tbwz5 -n kube-system
```
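As for the firewall check mentioned above, a hedged sketch on a CentOS 7 node like this one might be:

```bash
# Is a host firewall running on the node at all?
systemctl status firewalld
# Dump the live iptables rules and look for suspicious DROP/REJECT entries
iptables -S | grep -E -i 'drop|reject'
```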
Common problems

- A Pod stays in the Pending state because the pause image cannot be downloaded.
- A Pod is created successfully, but its RESTARTS count keeps climbing: the container's startup command does not stay running in the foreground.
- A Service cannot be reached through its service name. Inside a Kubernetes cluster you should access running microservices by service name wherever possible, yet sometimes that access fails. Since it involves DNS resolution of the service name, load distribution by the kube-proxy component, and the state of the backend Pod list, the problem can be investigated from the following angles.
1. Check whether the Service's backend Endpoints are normal

You can view a Service's backend Endpoint list with the kubectl get endpoints <service_name> command. If the list is empty, the possible causes are listed after the following output.
```bash
┌──[root@vms81.liruilongs.github.io]-[~]
└─$kubectl get svc
NAME                                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                        AGE
kube-dns                            ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP         50d
liruilong-kube-prometheus-kubelet   ClusterIP   None             <none>        10250/TCP,10255/TCP,4194/TCP   16d
metrics-server                      ClusterIP   10.111.104.173   <none>        443/TCP                        50d
┌──[root@vms81.liruilongs.github.io]-[~]
└─$kubectl get endpoints
NAME                                ENDPOINTS                                                                  AGE
kube-dns                            10.244.88.66:53,10.244.88.67:53,10.244.88.66:53 + 3 more...                50d
liruilong-kube-prometheus-kubelet   192.168.26.81:10250,192.168.26.82:10250,192.168.26.83:10250 + 6 more...   16d
metrics-server                      <none>                                                                     50d
┌──[root@vms81.liruilongs.github.io]-[~]
└─$
```
- The Service's Label Selector does not match the Pods' Labels, so no Pod is backing the Service.
- The backend Pods never reach the Ready state (dig further into the Pods' status with kubectl get pods).
- **The Service's targetPort does not match the Pods' containerPort**: the port the container exposes is not the port the Service exposes, so targetPort must be set to forward traffic to the container's port. A quick check for the first cause is sketched below.
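A quick way to test the first cause is to compare the Service's selector with the labels the Pods actually carry (the service name and label are placeholders):

```bash
# Which selector does the Service use?
kubectl describe svc <service_name> | grep -i selector
# Which Pods actually carry a matching label?
kubectl get pods -l <key>=<value> --show-labels
```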
2. Check whether the Service name resolves correctly to its ClusterIP

You can check from inside a client container by pinging <service_name>.<namespace>.svc. If the Service's ClusterIP comes back, the DNS service is resolving the service name correctly; if it does not, the Kubernetes cluster's DNS service may be malfunctioning. A minimal sketch follows.
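Here is one way to run that check from a throwaway client Pod (busybox:1.28 is a common choice for nslookup and is an assumption here, as is the target service):

```bash
# Resolve a Service name from inside the cluster, then delete the test Pod
kubectl run dns-test --image=busybox:1.28 --rm -it --restart=Never -- nslookup kube-dns.kube-system.svc
```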
3. Check whether kube-proxy's forwarding rules are correct

The kube-proxy service can be set to either IPVS or iptables load-distribution mode.

For IPVS mode, use the ipvsadm tool to inspect the IPVS rules on each Node and verify that the rules for the Service's ClusterIP have been set up correctly.

For iptables mode, inspect the iptables rules on each Node and verify that the rules for the Service's ClusterIP have been set up correctly. See the sketch below.
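A sketch of both inspections, reusing the kube-dns ClusterIP (10.96.0.10) from the kubectl get svc output shown earlier:

```bash
# IPVS mode: list the virtual servers and their real-server backends
ipvsadm -Ln
# iptables mode: find the rules that match a given ClusterIP
iptables-save | grep 10.96.0.10
```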
Seeking help